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Functional nnagnetic resonance innaging (fMRI) studies involve substantial acoustic noise. 
This review covers the difficulties posed by such noise for auditory neuroscience, as 
well as a nunnber of possible solutions that have emerged. Acoustic noise can affect the 
processing of auditory stinnuli by making them inaudible or unintelligible, and can result in 
reduced sensitivity to auditory activation in auditory cortex. Equally importantly, acoustic 
noise may also lead to increased listening effort, meaning that even when auditory stimuli 
are perceived, neural processing may differ from when the same stimuli are presented in 
quiet. These and other challenges have motivated a number of approaches for collecting 
auditory fMRI data. Although using a continuous echoplanar imaging (EPI) sequence 
provides high quality imaging data, these data may also be contaminated by background 
acoustic noise. Traditional sparse imaging has the advantage of avoiding acoustic noise 
during stimulus presentation, but at a cost of reduced temporal resolution. Recently, 
three classes of techniques have been developed to circumvent these limitations. The 
first is Interleaved Silent Steady State (ISSS) imaging, a variation of sparse imaging 
that involves collecting multiple volumes following a silent period while maintaining 
steady-state longitudinal magnetization. The second involves active noise control to limit 
the impact of acoustic scanner noise. Finally, novel MR! sequences that reduce the 
amount of acoustic noise produced during fMRI make the use of continuous scanning 
a more practical option. Together these advances provide unprecedented opportunities 
for researchers to collect high-quality data of hemodynamic responses to auditory stimuli 
using fMRI. 
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INTRODUCTION 

Over the past 20 years, functional magnetic resonance imag- 
ing (fMRI) has become the workhorse of cognitive scientists 
interested in noninvasively measuring locaUzed human brain 
activity Although the benefits provided by fMRI have been 
substantial, there are numerous ways in which it remains an 
imperfect technique. This is perhaps nowhere more true than 
in the field of auditory neuroscience due to the substantial 
acoustic noise generated by standard fMRI sequences. In order 
to study brain function using fMRI, auditory researchers face 
what can seem like an unappealing array of methodological 
decisions that impact the acoustic soundscape, cognitive per- 
formance, and imaging data characteristics to varying degrees. 
Here I review the challenges faced in auditory fMRI stud- 
ies, possible solutions, and prospects for future improvement. 
Much of the information regarding the basic mechanics of 
noise in fMRI can be found in previous reviews (Amaro et al., 
2002; Moelker and Pattynama, 2003; Talavage et al, 2014); 
although I have repeated the main points for completeness, I 
focus on more recent theoretical perspectives and methodological 
advances. 



SOURCES OF ACOUSTIC INTERFERENCE IN fMRI 

Table 1 summarizes several factors that contribute to the degra- 
dation of acoustic signals during fMRI. Echoplanar imaging (EPI) 
sequences commonly used to detect the blood oxygen level depen- 
dent (BOLD) signal in fMRI require radiofrequency (RF) pulses 
that excite tissue and gradient coils that help encode spatial posi- 
tion by altering the local magnetic field. During EPI the gradient 
coils switch between phase encoding and readout currents, pro- 
ducing Lorentz forces that act on the coils and connecting wires. 
These vibrations travel as compressional waves through the scan- 
ner hardware and eventually enter the air as acoustic sound. 
This gradient-induced vibration produces the most prominent 
acoustic noise during fMRI, and can continue for up to approxi- 
mately 0.5 s after the gradient activity ceases (Ravicz et al., 2000). 
Because the Lorentz force is proportional to the main magnetic 
field strength (Bq) and the gradient current, both high Bq and 
high gradient amplitudes generally increase the amount of acous- 
tic noise generated (Moelker et al., 2003). For example, increasing 
field strength from 0.2 to 3 T will bring maximum acoustic noise 
from ->85 to --130 dB SPL (Foster et al, 2000; Ravicz et al, 2000; 
Price etal, 2001). 
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FIGURE 1 I Even when subjects can hear stimuli, acoustic noise can 
impact neural activity through at least three pathways. First, acoustic 
noise from the scanner stinnulates the auditory pathway (including auditory 
cortex), reducing sensitivity to experimental stimuli. Second, successfully 
processing degraded stimuli may require additional executive processes 
(such as verbal working memory or performance monitoring). These 
executive processes are frequently found to rely on regions of frontal and 
premotor cortex, as well as the cingulo-opercular network. Finally, scanner 
noise may increase attentional demands, even for non-auditory tasks, an 
effect that is likely exacerbated in more sensitive subject populations. 
Although the specific cognitive and neural consequences of these challenges 
may vary, the critical point is that scanner noise can alter both cognitive 
demand and the patterns of brain activity observed through multiple 
mechanisms, affecting both auditory and non-auditory brain networks. 



Table 1 j Sources of acoustic interference during fMRI. 



Source 


Approximate noise level (dB SPL) 


Gradient coils 


85-130 


Heliunn punnp and air circulating 


57-76 


in-ear foann earplugs 




Sub-optinnal headphones 





Although the noise generated by gradient switching is the most 
obvious (i.e., loudest) source of acoustic noise during fMRI, it 
is not the only source of acoustic interference. RF pulses con- 
tribute additional acoustic noise, and noise is also present as a 
result of air circulation systems and helium pumps in the range of 
57-76 dB SPL (Ravicz et al, 2000). Because RF and helium pump 
noise is substantially quieter than that generated by gradient coils 
it probably provides a negligible contribution when scanning is 
continuous, but may be more relevant in sparse or interleaved 
silent steady state (ISSS) imaging sequences (described in a later 
section) when gradient- switching noise is absent. Auditory clar- 
ity can also be reduced as a result of in-ear hearing protection and 
sub-optimal headphone systems. 

Separately or together, these noise sources provide a level of 
acoustic interference that is significantly higher than that found 
in a typical behavioral testing environment. In the next section I 
turn to the more interesting question of the various ways in which 
this cacophony may impact auditory neuroscience. 

CHALLENGES OF ACOUSTIC NOISE IN AUDITORY fMRI 

Acoustic noise can influence neural response through at least 
three independent pathways, illustrated schematically in Figure 1 . 
The effects will vary depending on the specific stimuli, population 
being studied, and brain networks being examined. Importantly, 
though, in many cases the impact of noise on brain activation can 
be seen outside of auditory cortex. In this section I review the 
most pertinent challenges caused by acoustic scanner noise. 

ENERGETIC MASKING 

Energetic masking refers to the masking of a target sound by a 
noise or distractor sound that obscures information in the tar- 
get. That is, interference occurs at a peripheral level of processing, 
with the masker already obscuring the target as the sound enters 
the eardrum (and thus at the most peripheral levels of the audi- 
tory system). The level of masking is often characterized by the 
signal-to-noise ratio (SNR), which reflects the relative loudness 
of the signal and masker. For example, an SNR of +5 dB indi- 
cates that on average the target signal is 5dB louder than the 
masker. If scanner noise at a subject's ear is 80 dB SPL, achieving a 
moderately clear SNR of +5 would require presenting a target sig- 
nal at 85 dB SPL. When considering the masking effects of noise 
it is important to note that the characteristics of the noise are 
also important: noise that has temporal modulation can permit 
listeners to glean information from the "dips" in the noise masker. 

Energetic masking highlights the most obvious challenge of 
using auditory stimuli in fMRI: Subjects may not be able to 
perceive auditory stimuli due to scanner noise. If stimuli are 
inaudible — or less than fully perceived in some way — interpreting 
the subsequent neural responses can be problematic. A different 



(but related) sort of energetic masking challenge arises in experi- 
ments in which subjects are required to make vocal responses, as 
scanner noise can interfere with an experimenter's understand- 
ing of subject responses; in some cases this can be ameliorated by 
offline noise reduction approaches (e.g., Cusack et al., 2005). In 
addition, the presence of acoustic noise may also change the qual- 
ity of vocalizations produced by subjects (Junqua, 1996). Acoustic 
noise thus impacts not only auditory perception, but speech 
production, which may be important for some experimental 
paradigms. 

Two ways of ascertaining the degree to which energetic mask- 
ing is a problem are (1) to ask participants about their subjective 
experience hearing stimuli or (2) to include a discrimination or 
recall test that can empirically verify the degree to which audi- 
tory stimuli are perceived. Given individual differences in hearing 
level and ability to comprehend stimuli in noise, these are likely 
best done for each subject, rather than, for example, audibility 
being verified solely by the experimenter. It is also important to 
test audibility using stimuli representative of those used in the 
experiment, as the masking effects of scanner noise can be influ- 
enced by specific acoustic characteristics of the target stimuli (for 
example, being more detrimental to perception of birdsong than 
speech). 
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Although it is naturally important for subjects to be able to 
hear experimental stimuli (and for experimenters to hear subject 
responses, if necessary), the requirement of audibility is obvi- 
ous enough that it is often taken into account when designing 
a study. However, acoustic noise may also cause more pernicious 
challenges, to which I turn in the following sections. 

AUDITORY ACTIVATION 

A natural concern regarding acoustic noise during fMRI relates to 
the activation along the auditory pathway resulting from the scan- 
ner noise. If brain activity is modulated in response to scanner 
noise, might this reduce our ability to detect signals of interest? 
To investigate the effect of scanner noise on auditory activation, 
Bandettini et al. (1998) acquired data with and without EPI-based 
acoustic stimulation, enabling them to compare brain activity 
that could be attributed to scanner noise. They found that scanner 
noise results in increased activity bilaterally in superior temporal 
cortex (see also Talavage et al, 1999). Notably, this activity was 
not observed only in primary auditory cortex, but in secondary 
auditory regions as well. The timecourse of activation to scanner 
noise peaks 4-5 s after stimulus onset, returning to baseline by 
9-12 s (Hall et al, 2000), and is thus comparable to that observed 
in other regions of cortex (Aguirre et al, 1998). Scanner- related 
activation in primary and secondary auditory cortex limits the 
dynamic range of these regions, producing weaker responses to 
auditory stimuli (Shah et al, 1999; Talavage and Edmister, 2004; 
Langers et al, 2005; Gaab et al, 2007). In addition to overall 
changes in magnitude or spatial extent of auditory activation, 
scanner noise can affect the level at which stimuli need to be pre- 
sented for audibility, which can in turn affect activity down to 
the level of tonotopic organization (Langers and van Dijk, 2012). 
Thus, if activity along the auditory pathway proper is of interest, 
the contribution of scanner noise must be carefully considered 
when interpreting results. 

It is worth noting that while previous studies have investigated 
the effect of scanner noise on overall (univariate) response mag- 
nitude, the degree to which this overall change in gain may affect 
multivariate analyses is unclear. Again, this is true for activity 
in both auditory cortex and regions further along the auditory 
processing hierarchy (Davis and Johnsrude, 2007; Peelle et al., 
2010b). 

COGNITIVE EFFORT DURING AUDITORY PROCESSING 

Although acoustic noise can potentially affect all auditory pro- 
cessing, most of the research on the cognitive effects of acoustic 
challenge has occurred in the context of speech comprehen- 
sion. There is increasing consensus that decreased acoustic clarity 
requires listeners to engage additional cognitive processing to suc- 
cessfully understand spoken language. For example, after hearing 
a list of spoken words, memory is worse for words presented in 
noise, even though the words themselves are intelligible (Rabbitt, 
1968). When some words are presented in noise (but are still 
intelligible), subjects have difficulty remembering not only the 
words in noise, but prior words (Rabbitt, 1968; Cousins et al., 
2014), suggesting an increase in cognitive processing for degraded 
speech that lasts longer than the degraded stimulus itself and 
interferes with memory (Miller and Wingfield, 2010). Additional 



evidence supporting the link between acoustic challenge and 
cognitive resources comes from pupillometry (Kuchinsky et al, 
2013; Zekveld and Kramer, 2014) and visual tasks which relate 
to individual differences in speech perception ability (Zekveld 
et al., 2007; Besser et al., 2012). The additional cognitive resources 
required are not specific to acoustic processing but appear to 
reflect more domain- general processes (such as verbal working 
memory) recruited to help with auditory processing (Wingfield 
et al., 2005; Ronnberg et al., 2013). Thus, acoustic challenge can 
indirectly impact a wide range of cognitive operations. 

Consistent with this shared resource view, behavioral effects of 
acoustic clarity are reliably found on a variety of tasks. Van Engen 
et al. (2012) compared listeners' recognition memory for sen- 
tences spoken in conversational speech compared to those spoken 
in a clear speaking style (with accentuated acoustic features), 
and found that memory was superior for the acoustically- clearer 
sentences. Likewise, listeners facing acoustic challenge — due to 
background noise, degraded speech, or hearing impairment — 
perform poorer than listeners with normal hearing on auditory 
tasks ranging from sentence processing to episodic memory tasks 
(Pichora-Fuller et al, 1995; Surprenant, 1999; Murphy et al, 
2000; McCoy et al, 2005; Tun et al, 2010; Heinrich and Schneider, 
2011; Lash et al, 2013). 

Converging evidence for the neural effects of effortful listen- 
ing comes from flVlRI studies in which increased neural activ- 
ity is seen for degraded speech relative to unprocessed speech 
(Scott and McGettigan, 2013), illustrated in Figure 2. Davis and 
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FIGURE 2 I Listening to degraded speech requires increased reliance 
on executive processing and a more extensive network of brain 
regions. When speech clarity is high, neural activity is largely confined to 
traditional frontotennporal "language" regions including bilateral tennporal 
cortex and left inferior frontal gyrus. When speech clarity is reduced, 
additional activity is frequently seen in frontal cortex, including middle 
frontal gyrus, prennotor cortex, and the cingulo-opercular network 
(consisting of bilateral frontal operculunn and anterior insula, as well as 
dorsal anterior cingulate) (Dosenbach et al., 2008). 
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Johnsrude (2003) presented listeners with sentences that varied 
in their intelHgibility, with speech clarity ranging from unin- 
telligible to fully intelligible. They found greater activity for 
degraded speech compared to fully intelligible speech in the left 
hemisphere, along both left superior temporal gyrus and infe- 
rior frontal cortex. Importantly, increased activity in frontal and 
prefrontal cortex was greater for moderately distorted speech 
than either fully intelligible or fully unintelligible speech (i.e., 
an inverted U-shaped function), consistent with its involve- 
ment in recovering meaning from degraded speech (as distinct 
from a simple acoustic response). Acoustic clarity (i.e., SNR) 
also impacts the brain networks supporting semantic process- 
ing during sentence comprehension (Davis et al, 2011), possibly 
reflecting increased use of semantic context as top-down knowl- 
edge during degraded speech processing (Obleser et al., 2007; 
Obleser and Kotz, 2010; Sohoglu et al, 2012). 

Additional studies using various forms of degraded speech 
have also found difficulty- related increases in regions often asso- 
ciated with cognitive control or performance monitoring, such as 
bilateral insula and anterior cingulate cortex (Eckert et al., 2009; 
Adank, 2012; Wild et al, 2012; Erb et al, 2013; Vaden et al., 2013). 
The stimuli used in these studies are typically less intelligible than 
unprocessed speech (e.g., 4- or 6-channel vocoded^ speech, or 
low-pass filtered speech). Thus, although the increased recruit- 
ment of cognitive and neural resources to handle degraded speech 
is frequently observed, the specific cognitive processes engaged — 
and thus the pattern of neural activity — depend on the degree 
of acoustic challenge presented. An implication of this variabil- 
ity is that it may be hard to predict a priori the effect of acoustic 
challenge on the particular cognitive system(s) of interest. 

In summary, there is clear evidence that listening to degraded 
speech results in increased cognitive demand and altered pat- 
terns of brain activity. The specific differences in neural activity 
depend on the degree of the acoustic challenge, and thus may 
differ between moderate levels of degradation (when compre- 
hension accuracy remains high and few errors are made) and 
more severe levels of degradation (when comprehension is sig- 
nificantly decreased). It is important to note that effort-related 
differences in brain activity can be seen both within the classic 
speech comprehension network and in regions less typically asso- 
ciated with speech comprehension, and depend on the nature of 
both the stimuli and the task. Furthermore, the way in which 
these effort- related increases interact with other task manipula- 
tions has received little empirical attention, and thus the degree 
to which background noise may influence observed patterns of 
neural response for many specific tasks is largely unknown. 

Finally, although most of the research on listening effort 
has been focused on speech comprehension, it is reasonable to 
think that many of these same principles might transfer to other 
auditory domains, such as music or environmental sounds. And, 



^ Noise vocoding (Shannon et al, 1995) involves dividing the frequency 
spectrum of a stimulus into bands, or channels. Within each channel, the 
amplitude envelope is extracted and used to modulate broadband noise. Thus, 
the number of channels determines the spectral detail present in a speech sig- 
nal, with more channels resulting in a more detailed (and for speech, more 
intelligible) signal (see Figure 2 in Peelle and Davis, 2012). 



as covered in the next section, effects of acoustic challenge need 
not even be limited to auditory tasks. 

EFFECTS OF ACOUSTIC NOISE IN NON-AUDITORY TASKS 

Although the interference caused by acoustic noise is most obvi- 
ous when considering auditory tasks, it may also affect subjects' 
performance on non-auditory tasks (for example, by increas- 
ing demands on attention systems). The degree to which noise 
impacts non- auditory tasks is an important one for cognitive neu- 
roscience. Unfortunately, there have been relatively few studies 
addressing this topic directly. 

Using continuous EPI, Cho et al. (1998a) had subjects perform 
simple tasks in the visual (flickering checkerboard) and motor 
(finger tapping) domains, with and without additional scanner 
noise played through headphones. The authors found opposite 
effects in visual and motor modalities: activity in visual cortex was 
increased with added acoustic noise, whereas activity in motor 
cortex was reduced. 

To investigate the effect of scanner noise on verbal working 
memory, Tomasi et al. (2005) had participants to perform an 
n-back task using visually- displayed letters. The loudness of the 
EPI scanning was varied by approximately 12 dB by selecting 
two readout bandwidths to minimize (or maximize) the acous- 
tic noise. No difference in behavioral accuracy was observed as 
a function of noise level. However, although the overall spatial 
patterns of task- related activity were similar, brain activity dif- 
fered as a function of noise. The louder sequence was associated 
with increased activity in several regions including large portions 
of (primarily dorsal) frontal cortex and cerebellum, and the qui- 
eter sequence was associated with greater activity in (primarily 
ventral) regions of frontal cortex and left temporal cortex. 

Behaviorally, recorded scanner noise has been shown to impact 
cognitive control (Hommel et al., 2012); additional effects of 
scanner noise have been reported in fMRI tasks of emotional pro- 
cessing (Skouras et al, 2013) and visual mental imagery (Mazard 
et al., 2002). Thus, MRI acoustic noise influences brain function 
across a number of cognitive domains. 

It is not only the loudness of scanner noise that is an issue, but 
also the characteristics of the sound: whether an acoustic stimu- 
lus is pulsed or continuous, for example, can significantly impact 
both auditory and attentional processes. Haller et al. (2005) had 
participants perform a visual n-back task, using either a conven- 
tional EPI sequence or one with a continuous sound (i.e., not 
pulsed). Although behavioral performance did not differ across 
sequence, there were numerous differences in the detected neural 
response. These included greater activity in cingulate and por- 
tions of frontal cortex for the conventional EPI sequence, but 
greater activity in other portions of frontal cortex and left mid- 
dle temporal gyrus for the continuous noise sequence. As with 
conventional EPI sequences, scanner noise is once again found 
to impact neural processing in areas beyond auditory cortex (see 
also Haller et al, 2009). 

It is worth noting that not every study investigating this issue 
has observed effects of acoustic noise in non-auditory tasks: 
Elliott et al. (1999), using participants performing visual, motor, 
and auditory tasks, found that scanner noise resulted in decreased 
activity uniquely during the auditory condition. Nevertheless, the 
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number of instances in which scanner noise has been found to 
affect neural activity on non- auditory tasks is high enough that 
the issue should be taken seriously: Although exactly how much 
of the difference in neural response can be attributed to scanner 
noise is debatable, converging evidence indicates that the effects 
of scanner noise frequently extend beyond auditory cortex (and 
auditory tasks). These studies suggest that (1) a lack of behav- 
ioral effect of scanner noise does not guarantee equivalent neural 
processing; (2) both increases and decreases in neural activity are 
seen in response to scanner noise; and (3) the specific regions in 
which noise-related effects are observed vary across study. 

OVERALL SUBJECT COMFORT AND SPECIAL POPULATIONS 

An additional concern regarding scanner noise is that it may 
increase participant discomfort. Indeed, acoustic noise can cause 
anxiety in human subjects (Quirk et al., 1989; Melendez and 
McCrank, 1993), a finding which may also extend to animals. 
Scanner noise presents more of a challenge for some subjects 
than others, and it may be possible to improve the comfort of 
research subjects (and hopefully their performance) by reducing 
the amount of noise during MRI scanning. Additionally, if pop- 
ulations of subjects differ in a cognitive ability such as auditory 
attention, the presence of scanner noise may affect one group 
more than another. For example, age can significantly impact the 
degree to which subjects are bothered by environmental noise 
(Van Gerven et al., 2009); similarly, individual differences in noise 
sensitivity may contribute to (or reflect) variability in the effects 
of scanner noise on neural response (Pripfl et al, 2006). These 
concerns may be particularly relevant in clinical or developmental 
studies with children, participants with anxiety or other psychi- 
atric condition, or participants who are particularly bothered by 
auditory stimulation. 

A CAUTIONARY NOTE REGARDING INTERACTIONS 

One argument sometimes made in auditory fMRI studies using 
standard EPI sequences is that although acoustic noise may have 
some overall impact, because noise is present during all experi- 
mental conditions it cannot influence the results when comparing 
across conditions (which is often of most scientific interest). 
Given the ample amount of evidence for auditory- cognitive inter- 
actions, such an assumption seems tenuous at best. If anything. 



there is good reason to suspect interactions between acoustic 
noise and task difficulty, which may manifest differently depend- 
ing on particular stimuli, listeners, and statistical methods (for 
example, univariate vs. multivariate analyses). In the absence of 
empirical support to the contrary, claims that acoustic noise is 
unimportant should be treated with skepticism. 

SOLUTIONS FOR AUDITORY fMRI 

Although at this point the prospects for auditory neuroscience 
inside an MRI scanner may look bleak, there is still cause for opti- 
mism. In this section I provide an overview of several methods 
for dealing with scanner noise that have been employed, noting 
advantages and disadvantages of each. These approaches are listed 
in Table 2, a subset of which is shown in Figure 3. 

PASSIVE HEARING PROTECTION 

Subjects in MRI studies typically wear over-ear hearing protection 
that attenuates acoustic noise by approximately 30 dB. Subjects 
may also wear insert earphones, or foam earplugs that can pro- 
vide additional reduction in acoustic noise of 25-28 dB, for a 
combined reduction of approximately 40 dB (Ravicz and Melcher, 
2001). Although hearing protection can reduce the acoustic noise 
perceived during MRI, it cannot eliminate it completely: Even if 
perfect acoustic isolation could be achieved at the outer ear, sound 
waves still travel to the cochlea through bone conduction. Thus, 
hearing protection is only partial solution, and some degree of 
auditory stimulation during conventional fMRI is unavoidable. 
In addition, passive hearing protection may change the frequency 
spectrum of stimuli, affecting intelligibility or clarity. 

CONTINUOUS SCANNING USING A STANDARD EPI SEQUENCE 

One approach in auditory fMRI is to present stimuli using 
a conventional continuous scanning paradigm, taking care to 
ensure that participants are able to adequately hear the stim- 
uli (Figure 4A). This approach generally assumes that, because 
scanning noise is consistent across experimental condition, it is 
unlikely to systematically affect comparisons among conditions 
(typically what is of interest). I have already noted above the dan- 
ger of this assumption with respect to additional task effects and 
ubiquitous interactions between perceptual and cognitive factors. 
However, for some paradigms a continuous scanning paradigm 



Table 2 | Methods for dealing with acoustic noise in fMRI. 



Approach 


Approximate noise Requires custom 
reduction during scanner 
stimulus (dB)^ hardware? 


Requires custom 

presentation 

equipment? 


Requires custom 
MRI sequence? 


Image quality 
relative to 
continuous 


Temporal 
resolution relative 
to continuous 


Continuous EPI 


0 


No 


No 


No 






Passive hearing protection 


35 


No 


No 


No 


No change 


No change 


Sparse innaging 


50 


No 


No 


No 


No change 


Reduced 


ISSS innaging 


50 


No 


No 


Yes 


No change 


Slightly reduced 


Active noise control 


40 


No 


Yes 


No 


No change 


No change 


Quiet MRI sequences 


20 


No 


No 


Yes 


Reduced 


Slightly reduced 


Scanner hardware nnodification 


20 


Yes 


No 


No 


No change 


No change 



^ The actual reduction of acoustic noise can vary substantially depending on the specific equipment and implementation; these numbers are provided as a rough 
estimate. 
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FIGURE 3 I Schematic illustration of the relationship between 
temporal resolution and acoustic noise during stimulus presentation 
for various MRI acquisition approaches. Although the details for any 
specific acquisition depend on a combination of nnany factors, in general 
significant reductions in acoustic noise are associated with poorer tennporal 
resolution. 



may be acceptable. From an imaging perspective continuous 
imaging will generally provide the largest quantity of data, and 
no special considerations are necessary when analyzing the data. 
Continuous EPI scanning has been used in countless studies to 
identify brain networks responding to environmental sounds, 
speech, and music. The critical question is whether the cogni- 
tive processes being imaged are actually the ones in which the 
experimenter is interested^. 

SPARSE IMAGING 

When researchers are concerned about acoustic noise in fMRI, by 
far the most widely used approach is sparse imaging, also referred 
to as clustered volume acquisition (Scheffier et al, 1998; Eden 
et al, 1999; Edmister et al, 1999; Hall et al, 1999; Talavage and 
Hall, 2012). In sparse imaging, illustrated in Figure 4B, the repeti- 
tion time (TR) is set to be longer than the acquisition time (TA) of 
a single volume. Slice acquisition is clustered toward the end of a 
TR, leaving a period in which no data are collected. This interven- 
ing period is relatively quiet due to the lack of gradient switching, 
and permits stimuli to be presented in more favorable acous- 
tic conditions. Because of the inherent lag of the hemodynamic 
response (typically 4-7 s to peak), the scan following stimulus 
presentation can still measure responses to stimuli, including the 
peak response if presentation is timed appropriately. 



^For researchers who question whether acoustic noise during fMRI may 
impact cognitive processing, it may be interesting to suggest to a cognitive 
psychologist that they play 100 dB SPL sounds during their next behavioral 
study and gauge their enthusiasm. 



The primary disadvantage of sparse imaging is that due to the 
longer TR, less information is available about the timecourse of 
the response (i.e., there is a lower sampling rate). In addition to 
reducing the accuracy of the response estimate, the reduced sam- 
pling rate also means that differences in timing of response may 
be interpreted as differences in magnitude. An example of this is 
shown in Figure 4B, in which hemodynamic responses that differ 
in magnitude and timing will give different results, depending on 
the time at which the response is sampled. 

The lack of timecourse information in sparse imaging can be 
ameliorated in part by systematically varying the delay between 
the stimulus and volume collection (Robson et al., 1998; Belin 
et al., 1999), illustrated in Figure 4D. In this way, the hemody- 
namic response can be sampled at multiple time points relative to 
stimulus onset over different trials. Thus, across trials, an accurate 
temporal profile for each category of stimulus can be estimated. 
Like all event- related fMRI analyses this approach assumes a con- 
sistent response for all stimuli in a given category. It also may 
require prohibitively long periods of scanning to sample each 
stimulus at multiple points; this requirement has meant that in 
practice varying presentation times relative to data collection is 
done infrequently. 

Many studies incorporating sparse imaging use an event- 
related design, along with TRs in the neighborhood of 16 s or 
greater, in order to allow scanner-induced BOLD response to 
return to near baseline levels on each trial. Although this may 
be particularly helpful for experiments in which activity in pri- 
mary auditory areas is of interest, it is not necessary for all studies, 
and in principle sparse designs can use significantly shorter TRs 
(e.g., <5 s). Sometimes referred to as "fast" sparse designs, sparse 
designs with shorter TRs enable researchers to take advantage of a 
faster stimulus presentation rate and acquire more data for a given 
period of time, and for many experiments may be a more efficient 
approach (Perrachione and Ghosh, 2013). 

Cardiac gating 

Cardiac gating addresses problems caused by the fact that heart- 
beat and associated changes in blood flow can displace brainstem 
structures, making activity in these regions difficult to detect. 
With cardiac gating, researchers monitor a subject's heart rate, 
and then adjust volume acquisition to be synchronized to the 
heart rate (i.e., occurring at a consistent time in the heart rate 
cycle) (Guimaraes et al, 1998). Because heart rate will not per- 
fectly align with a chosen TR, using cardiac gating results in a 
variable TR (zb approximately Vi heart rate). (With relatively long 
TRs, the variability in sampling rate is typically not a significant 
problem, as the response to one trial is unlikely to overlap the 
response to another trial). Cardiac gating reduces data variability 
due to cardiac pulse motion artifacts and can thus improve ability 
to detect activity in subcortical structures prone to these artifacts, 
such as the inferior colliculus and medial geniculate body (Harms 
and Melcher, 2002; Overath et al, 2012). 

INTERLEAVED SILENT STEADY STATE (ISSS) IMAGING 

The main disadvantages in traditional sparse imaging come from 
the lack of information about the timecourse of the hemody- 
namic response, and the relatively small amount of data collected 
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FIGURE 4 I Different approaches to imaging auditory stimuli provide 
varying compromises between temporal resolution and acoustic 
noise. Example BOLD responses are shown in blue and red; these 
could reflect different responses across individuals or experinnental 
conditions. (A) Continuous EPI provides relatively good tennporal 
resolution, but with a high level of continuous acoustic noise. (B) 
Sparse innaging includes a period in which no data is collected, allowing 
the presentation of stinnuli in relative quiet (due to the absence of 
gradient switching noise). The delay in the hennodynannic response 
enables the peak response to be collected after stinnulus presentation 
has finished. The reduced tennporal resolution of a traditional sparse 
innaging sequence nnay obscure differences in response latency or 
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shape. In the hypothetical exannple, the blue response peaks higher 
than the red response; however, at the tinne when the sparse data 
point is collected, the red response is higher. (C) With interleaved 
silent steady state (ISSS) innaging, stinnuli can also be presented in the 
absence of gradient switching noise, but a greater annount of data can 
be collected after presentation connpared to sparse innaging. The delay 
in the hennodynannic response enables peak responses to be collected 
with relatively good tennporal resolution. (D) By varying the tinne at 
which stinnuli are presented relative to data collection across trials, 
non-continuous innaging can still provide infornnation about the 
tinnecourse of the average response to a category of stinnuli. Note how 
a different part of the BOLD response is sampled on each trial. 



(leading to potentially less accurate parameter estimates and fewer 
degrees of freedom in first-level analyses). Although in principle 
multiple volumes can be acquired following each silent period, 
the equilibrium state of the brain tissue changes during these 
silent periods: The additional scans do not reflect steady-state 
longitudinal magnetization, and thus vary over time. The lack 
of steady- state longitudinal magnetization adds variance to the 
data that can be challenging to account for in timeseries statistical 
models. 

Schwarzbauer et al. (2006) developed a solution to this prob- 
lem by implementing a sequence with continuous excitation 
RF pulses, but variable readout gradients. The excitation pulses 
maintain steady state longitudinal magnetization but produce rel- 
atively little acoustic noise. As in traditional sparse imaging, an 
ISSS sequence permits stimuli to be presented in quiet and the 
peak BOLD activity to be captured due to the delay in hemody- 
namic response. However, with ISSS any number of volumes can 
be obtained following a silent period, as illustrated in Figure 4C. 
Although technically the temporal resolution is reduced relative 
to continuous scanning — as there are times when no data is being 
collected — the effective temporal resolution can be nearly as good 
as continuous scanning because data collection can capture much 
of the BOLD response following stimulus presentation: The abil- 
ity of the sequence to capture the early hemodynamic response 
is limited solely by the length of the stimuli (with shorter stim- 
uli permitting data collection to start closer to stimulus onset). 



ISSS thus combines advantages of continuous and sparse imag- 
ing, allowing the presentation of stimuli in relative quiet, while 
still providing information on the timing of the hemodynamic 
response. Variations of ISSS fMRI have now been used success- 
fully in numerous studies of auditory processing (Doehrmann 
et al, 2008, 2010; Bekinschtein et al, 2011; Davis et al, 2011; 
Engel and Keller, 201 1; Mueller et al, 201 1; Rodd et al, 2012; Yoo 
et aL,2012). 

Compared to continuous or sparse imaging data, ISSS data can 
be challenging to analyze because the data are discontinuous — 
that is, the sampling rate is not consistent. Because of this added 
wrinkle, below I briefly review two examples of analyzing ISSS 
data, illustrated in Figure 5. No doubt with increasing experience 
ISSS data analysis can be further refined. These descriptions are 
based on an imaginary event- related fMRI study with two condi- 
tions (A and B) and a TR of 2 s. Each trial involves presenting a 
single stimulus during a period of 4 s of silence, followed by 8 s of 
data acquisition. With a TR of 2 s, this results in 4 volumes of data 
per trial. 

Analyzing ISSS fMRI data using a finite impulse response (FIR) 
model 

Perhaps the most straightforward approach to analyzing ISSS 
fMRI data is to use a finite impulse response (FIR) model, 
shown in Figure 5A. A typical FIR model would consider only 
the scans on which data was collected. The model would thus 
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FIGURE 5 I Two examples of ISSS fMRI data analysis. The example 
experiment is illustrated in the top row and identical for both approaches. No 
data are collected during stimulus presentation; following each silent period 4 
scans are collected. (A) In the finite impulse response (FIR) model, scans are 
concatenated, and each time bin following an event is modeled using a 
separate regressor. The modeled scans have temporal discontinuities, but 
accurately represent all of the data collected. (B) By incorporating dummy 



scans in the modeled timeseries, the original temporal structure of the true 
data is preserved, facilitating the use of basis functions such as a canonical 
HRF. Regressors for experimental conditions should be set to 0 during the 
period of the dummy scans; the dummy scans themselves can be modeled 
with a single regressor. However, the modeled scans now overestimate the 
amount of data collected, artificially inflating the degrees of freedom in 
single-subject (first-level) models. 



have 4 regressors for condition A (one for each volume following 
stimulus presentation), and 4 regressors for condition B. These 
regressors would model the response at each time bin follow- 
ing a stimulus, making no assumptions about the shape of the 
response. As with any FIR Model, given the multiple regressors 
for each condition, there are several ways of summarizing the 
response to a condition, including an F-test over all 4 columns 
for a condition (asking: is there any effect at any time point?) 
or a t-test over all 4 columns (on average is there an increased 
response?). Similar options exist for comparing response between 
conditions. 

Because the ISSS scans are not continuous, care must be taken 
when implementing temporal filtering, including typical highpass 
filtering done on fMRI timeseries data. Omitting highpass filter- 
ing may make an analysis particularly susceptible to the influence 
of low- frequency (non-acoustic) noise. One way to help mitigate 
this issue is to ensure trials of different conditions are not too 
far apart in time so that comparisons across conditions are not 
confounded with low- frequency fluctuations in the signal. 

Analyzing ISSS fMRI data using dummy scans to mimic a 
continuous timeseries 

An alternative approach is to ensure that rows of the design matrix 
correspond to a continuous timeseries, illustrated in Figure 5B. 
To accomplish this, dummy volumes can be included in the design 
matrix during the period in which no data were actually collected. 
A straightforward option is to use the mean EPI image across 
all (real) volumes in a session, although any identical image will 
work: Using an identical image for all dummy images means that 
all dummy images can be perfectly modeled using a single regres- 
sor (0 for real scans, 1 for dummy scans). With this model it is 
then possible to use a canonical HRF (or any other basis set) for 



events of interest; the parameter estimates for these regressors are 
not influenced by the dummy scans. It is important to set the val- 
ues for the non- dummy regressors to zero during the dummy 
scans to preserve estimation of the parameter estimate, and to 
rescale the regressors so that the maximum values are matched 
after these adjustments. 

It is not actually necessary to use dummy scans in order to 
take advantage of timeseries properties, such as highpass filter- 
ing or using an informed basis function (e.g., a canonical HRF); 
an appropriate design matrix that takes into account the discon- 
tinuous nature of the data could be constructed. However, the use 
of dummy scans facilitates constructing design matrices within 
common fMRI analysis software packages, which are typically 
designed to work with continuous timeseries data. 

When dummy scans are included in the final design matrix, 
the default degrees of freedom in the model will be incorrectly 
high, as the dummy scans should not be counted as observations. 
Thus, for first-level (single subject) analyses, an adjustment to the 
degrees of freedom should be made for valid statistical inference. 
For group analyses using a two -stage summary statistics proce- 
dure, however, adjusting for first-level degrees of freedom is not 
necessary. 

ACTIVE NOISE CONTROL 

A different approach to reducing the impact of acoustic noise in 
the MRI scanner is to change the way this sound is perceived by 
listeners using active noise control (Hall et al., 2009). As typically 
implemented, active noise control involves measuring the prop- 
erties of the scanner noise, and generating a destructive acoustic 
signal (also known as "antinoise") which is sent to the head- 
phones that cancels a portion of the scanner noise (Chambers 
et al, 2001, 2007; Li et al, 2011). The destructive signal is based 
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on estimates of scanner noise that can either be fixed, or adjusted 
over the course of a scanning session to accommodate changes in 
the scanner noise. Adjusting over time may be important in the 
context of fMRI as subjects may move their heads over the course 
of a scanning session, which affects the acoustic characteristics of 
the noise reaching their ears. 

In addition to sound presentation hardware, active noise con- 
trol also requires an MR- compatible method for measuring the 
acoustic noise in the scanner, used to shape the destructive noise 
pulses. Whether sound is generated in the headset, or passed 
through a tube, the timing of this canceling sound is critical, as 
it must arrive with the specified phase relationship to the scanner 
noise. 

Active noise control can reduces the level of acoustic noise 
by 30-40 dB, and subjective loudness by 20 dB (the difference 
between these measures likely reflecting the contribution of 
bone conducted vibration) (Hall et al, 2009; Li et al, 2011). 
Particularly relevant is that when using relatively simple auditory 
stimuli (pure tone pitch discrimination), (1) behavioral perfor- 
mance in the scanner was significantly better and (2) activity in 
primary auditory regions was significantly greater under condi- 
tions of active noise control compared to normal presentation 
(Hall et al.,2009). 

USING CONTINUOUS fMRI SEQUENCES WITH REDUCED ACOUSTIC 
NOISE 

Software modifications to EPI sequences intended to reduce the 
effects of acoustic scanner noise can be broadly grouped into two 
approaches: changing the nature of the acoustic stimulation and 
reducing the overall sound levels. 

One approach to reducing sound levels of a standard EPI 
sequence is to modify the gradient pulse shape (Hennel et al., 
1999). Typically, gradient pulses are trapezoidal, to increase 
the speed and efficiency of gradient encoding. By using sinu- 
soidal pulses, acoustic noise can be reduced during BOLD fMRI 
(Loenneker et al., 2001), with some increase in the spatial 
smoothness of the reconstructed data. 

Building on the idea of modified pulse shape, another type of 
quiet fMRI sequence was introduced by Schmitter et al. (2008). 
Their quiet continuous EPI sequence takes advantage of two key 
modifications to reduce acoustic noise. The first involves collect- 
ing data using a sinusoidal traversal of k space, enabling more 
gradual gradient switching (readout gradients are purely sinu- 
soidal) and reducing the acoustic noise produced. The second 
modification addresses the fact that a large component of the 
acoustic noise during EPI comes from the resonance of the scan- 
ner hardware to the gradient switching. This reflects specific 
physical properties of each scanner, and varies across different 
speeds of gradient switching. Thus, it is possible to perform 
scanner- specific measurements of the acoustic noise generated 
for different readout gradient switching frequencies, and select a 
combination of parameters that is relatively quiet, but does not 
unacceptably compromise signal quality. In Peelle et al. (2010a), 
we chose a bandwidth of 1220 Hz/Px and an echo time of 44 ms 
(compared to a standard sequence with values of 2230 Hz/Px and 
30 ms, respectively). As might be expected, the longer echo time 
lead to moderate increases in signal dropout in regions prone to 



susceptibility artifact, such as inferior temporal and ventromedial 
frontal cortex. Together, these modifications produce approxi- 
mately a 20 dB reduction in acoustic noise for the scanner, and 
using this sequence results in greater activity in several auditory 
regions compared to a standard continuous sequence (Schmitter 
et al, 2008; Peelle et al, 2010a). 

Taking another approach to reducing the impact of scanner 
noise on observed activation, Seifritz et al. (2006) developed a 
continuous-sound EPI sequence to reduce the auditory stim- 
ulation caused by rapid acoustic pulses (Harms and Melcher, 
2002), as found in conventional EPI. In their sequence the RE 
excitation pulses, phase -encoding gradients, and readout gradi- 
ents are divided into short trains. The resulting repetition rate is 
fast enough that the acoustic noise is perceived as a continuous 
sound, rather than the pulsed sound perceived in conventional 
EPI. Using sparse imaging, the authors compared neural activity 
in response to audio recordings of conventional EPI compared to 
the "continuous sound" sequence. They found that the continu- 
ous sound sequence resulted in reduced activity in auditory cortex 
due to scanner noise, and increased activity to experimental 
manipulations. 

SCANNER HARDWARE MODIFICATION 

Although it may not be practical for most research groups to 
significantly modify scanner hardware, by changing the physical 
configuration of the MRI scanner it is possible to significantly 
reduce the amount of acoustic noise generated. Some approaches 
have included the use of rotating coils to reduce gradient switch- 
ing (Cho et al., 1998b), placing the gradient coils in a vacuum 
to reduce noise propagation (Katsunuma et al, 2001), or altering 
the coil structure (Mansfield and Haywood, 2000). By combin- 
ing multiple approaches and focusing on the largest contributors 
to acoustic noise, substantial reductions in noise levels can be 
achieved (Edelstein et al, 2002). In the future, commercial appli- 
cations of these approaches may help to limit the impact of 
scanner noise during fMRI, particularly when combined with 
some of the other solutions outlined above. 

AUDITORY fMRI IN NONHUMAN ANIMALS 

Although my focus has been on fMRI of human subjects, many 
of these same challenges and solutions apply equally when using 
fMRI with animals (Petkov et al, 2009; Hall et al, 2014). As 
with human listeners, the choice of scanning protocol will depend 
on a researcher's primary interests and the acceptable level of 
tradeoff between data quality, temporal resolution, and acous- 
tic noise. Although some concerns about attention and cognitive 
challenge may be mitigated when dealing with sedated animals, in 
the absence of empirical support it is probably not safe to assume 
that one protocol will prove optimal in all situations. In addition, 
the timing parameters of any non-continuous sequence will natu- 
rally need to be optimized for the HRF of the animal being studied 
(Brown et al, 2013). 

CHOOSING THE APPROPRIATE SOLUTION 

As discussed above, different solutions for auditory fMRI have 
intrinsic strengths and weaknesses, and thus any chosen approach 
involves a degree of compromise with respect to acoustic noise 
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FIGURE 6 I Choosing the best method for auditory fMRI involves 
considering a number of dimensions. These dimensions are not 
independent: for exannple, using a nnodified EPI sequence nnay change the 
properties of the MRI data, the acoustic properties of the scanning noise, 
and resulting innpact on psychological processes. The focus of optimization 
will depend on the acoustic characteristics of the stimuli and the neural 
processes of interest. 



(loudness or quality), psychological impact, and MRI data char- 
acteristics. It may be useful to think about this in a frame- 
work of multidimensional optimization, as illustrated in Figure 6. 
Because these dimensions are not independent, it is impossible to 
optimize for everything simultaneously (for example, approaches 
that have the lowest acoustic noise also tend to have poorer tem- 
poral resolution, forcing a researcher to choose between noise 
level and temporal resolution). It is therefore important to iden- 
Xi^Y the dimensions that are most important for a given study. 
These will depend on the specific stimuli and scientific question 
at hand. 

Although there are exceptions, as a general rule it is proba- 
bly safest to prioritize the auditory and psychological aspects of 
data collection. If the processing of stimuli is affected by scan- 
ner noise (through masking or increased perceptual effort), the 
resulting neural processing may differ from what the researchers 
are interested in. In this case increased image quality will not 
help in identif)^ing neural activity of interest. Thus, a sparse imag- 
ing sequence is nearly always preferable to continuous sequences 
because it presents the lowest level of background noise, and 
is straightforward to implement. If possible, an ISSS sequence 
presents an even stronger solution as it permits the presenta- 
tion of stimuli in relative quiet, while not sacrificing temporal 
resolution to the same degree as a traditional sparse sequence. 

When it is not feasible to present stimuli in the absence of scan- 
ner noise, considering the acoustic characteristics of the stimuli is 
critical. For example, if speech prosody, voice/speaker perception, 
or musical timber is of interest, spectral cues may be particularly 
important, and thus the spectrum of the scanner noise may be 
a deciding factor. In contrast, for other stimuli (such as musical 



beat, or other aspects of speech perception) temporal factors may 
dominate. 

That being said, from a practical standpoint the majority 
of researchers will be constrained by available sequences and 
equipment, and thus the most common choice will be between 
a continuous EPI sequence and a traditional sparse sequence. In 
this case, adapting a paradigm and stimuli to work with the sparse 
sequence is almost always a safer choice. 

RELYING ON CONVERGING EVIDENCE TO SUPPORT CONCLUSIONS 

Although it is no doubt important to optimize fMRI acquisition 
and analysis parameters for auditory studies, the strongest infer- 
ences will always be drawn based on converging evidence from 
multiple modalities. With respect to auditory processing, this 
includes functional neuroimaging methods that allow the mea- 
suring of neural response in the absence of external noise such 
as positron emission tomography (PET), electroencephalography 
(EEG), magnetoencelphalography (MEG), electrocorticography 
(ECoG), or optical imaging, as well as studying behavior in peo- 
ple with differences in brain structure (e.g., as a result of stroke or 
neurodegenerative disease). 

CONCLUSIONS AND RECOMMENDATIONS 

Echoplanar fMRI is acoustically noisy and poses considerable 
challenges for researchers interested in studying auditory pro- 
cessing. Although it is impossible to fully resolve the tension 
between the acoustic noise produced during fMRI and the desired 
experimental environment, the following steps will often be help- 
ful in optimizing auditory responses and our interpretation of 
them: 

(1) Address, rather than ignore, the possible effects of back- 
ground noise on activity seen in fMRI studies. Considering 
scanner noise is particularly important when using auditory 
stimuli, but may apply to non- auditory stimuli as well. 

(2) When possible, use methods that limit the impact of acoustic 
noise during fMRI scanning. 

(3) Provide empirical demonstrations of the effect of scanner 
noise on specific paradigms and analyses. 

It is an exciting time for auditory neuroscience, and continuing 
technical and methodological advances suggest an even brighter 
(though hopefully quieter) future. 
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